Introduction

This project aims to provide a predictive model of home prices for Davidson County, Tennessee. Being able to predict home prices and to identify the features that add to or diminish a home's value is critical not only to homeowners and buyers but also to the housing market as a whole. The existing predictive models that Zillow built for the County appear to lack consideration of local factors, which may reduce the accuracy of some predictions. Our goal is therefore to build a reliable model for Zillow that takes local intelligence into account and helps provide robust data for its clients.

Data wrangling and feature engineering are the challenging parts of this project. The limited open data available for Davidson County makes data collection difficult, and it is also hard to judge whether a given data source is reliable. Finding the features that best explain home prices and their spatial patterns is the key to building a robust and effective model. Building new features from the datasets we obtained online requires not only background knowledge of the local housing market but also a good deal of trial and error.

The overall modeling strategy is based on the statistical tools we learned in class. Using the dataset provided by the instructor together with data gathered from online sources, we built a multiple linear regression model that explains observed sale prices and trained it on the training dataset. We then used the model to predict the unknown sale prices. We assess the model's performance by examining the errors between predicted and observed sale prices.


Data

Data Collection and Feature Engineering

Besides the dataset provided by the instructor, we obtained additional data primarily from the Nashville Open Data Portal. Census data, such as the number of individuals living in poverty and the number of individuals with a bachelor’s degree, were downloaded from the Census API through the “tidycensus” package. We also gathered data from the Google API on grocery stores, retail stores, universities, and clinics. In the feature engineering process, we used our domain knowledge to create new features relevant to the dependent variable we are predicting, Sale Price.
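The distance and buffer features listed in the variable dictionary (for example d_grocery and Parks) can be sketched as follows. This is a minimal illustration and not the code used for the report: the function names, the coordinates, and the assumption that all points are in a projected coordinate system sharing the buffer's unit are hypothetical.

```python
import math

def nearest_distance(home, places, k=1):
    """Mean straight-line distance from one home to its k closest places."""
    dists = sorted(math.hypot(home[0] - px, home[1] - py) for px, py in places)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def within_buffer(home, places, radius):
    """Return 1 if any place lies within `radius` of the home, else 0."""
    return int(nearest_distance(home, places, k=1) <= radius)

# Hypothetical coordinates in miles; these are not the report's data.
homes = [(0.0, 0.0), (3.0, 4.0)]
groceries = [(0.0, 1.0), (6.0, 8.0)]
parks = [(0.0, 0.2)]

d_grocery = [nearest_distance(h, groceries) for h in homes]
near_park = [within_buffer(h, parks, radius=0.25) for h in homes]
print(d_grocery, near_park)
```

Setting k above 1 averages the distance over several nearby places, which is one way a "nearest 5" style feature could be built.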

Summary Statistics

Variable Dictionary

  1. SalePrice — Davidson County Home Price.

  2. LocationCity — The City name.

  3. LocationZip — Zip code.

  4. CensusBlock — Census block number.

  5. LandUseFullDescription — Land use.

  6. neighjud — Dummy variable indicating whether the home is within or outside the neighborhood district.

  7. num_vacant_unit — Number of vacant units.

  8. WhiteAlone — Number of individuals who identify as white alone.

  9. Poverty — Individuals below 100% poverty level.

  10. Unemploy — Unemployed individuals.

  11. BachDegree — Individuals with a bachelor’s degree.

  12. Parks — Dummy variable indicating whether the home is within or outside the 0.25-mile buffer of a park.

  13. d_prisecroads — Distance to primary and secondary roads.

  14. d_Retail — Distance to retail stores.

  15. d_clinics — Distance to clinics.

  16. d_grocery — Distance to grocery stores.

  17. d_crime — Distance to reported aggravated crimes.

  18. age — Year built.

  19. nasimp — First two numbers of the neighborhood code used by the Assessor’s office to group similar properties for the purpose of determining property value.

  20. Acrage — Acres of land.

  21. Story_Height — Number of stories of the building.

  22. Exterior_Wall — Exterior wall type.

  23. Frame — Building frame type.

  24. units_building — Number of units in a multi-family building, such as a duplex.

  25. sf_finished — Square feet of finished area.

  26. sf_bsmt — Square feet of the basement if any.

  27. ac_sfyi — Central air. 0 = no central air; 1 = central air (Residential).

  28. Phys_Depreciation — Building condition.

  29. NumofUnits_land — Units used for appraisal.

  30. Zone_Assessor — Zones (jurisdictions). 9 large areas of the county used by the appraisal staff to coordinate appraisal teams.

  31. Land_Unit_Type — Type of units.

  32. baths — Number of baths.

  33. Fixtures — Estimated plumbing fixtures.

  34. Foundation — Type of foundation.

  35. AveSale2 — Average sale price of nearest two neighbors.

  36. AveSale5 — Average sale price of nearest five neighbors.

  37. AveSale10 — Average sale price of nearest ten neighbors.


Summary Tables


Building Characteristics
===============================================================================
Statistic         N      Mean      St. Dev.    Min  Pctl(25) Pctl(75)    Max   
-------------------------------------------------------------------------------
housing_units   8,415  2,061.680    707.295    450   1,565    2,582     4,911  
num_vacant_unit 8,415   188.027     144.506     8      92      247      1,580  
SalePrice       8,415 312,147.700 307,978.500 2,000 150,000  376,840  6,894,305
Acrage          8,415    0.209       0.321    0.000  0.000    0.270     8.160  
sf_finished     8,415  1,843.708    884.946    348   1,238    2,206    10,608  
sf_bsmt         8,415   185.892     450.246     0      0        0       3,531  
NumofUnits_land 8,415   75.530     1,825.649    0      1        1      116,741 
Fixtures        8,415    9.623       3.658      3      7        12       38    
-------------------------------------------------------------------------------

Spatial Structure
===============================================================================================
Statistic       N        Mean       St. Dev.      Min      Pctl(25)    Pctl(75)        Max     
-----------------------------------------------------------------------------------------------
CensusBlock   8,415 37,015,416.000  2,847.677  37,010,105 37,012,801  37,017,902   37,019,600  
Parks         8,415     0.135         0.342        0           0           0            1      
d_Retail      8,415     0.036         0.024      0.001       0.020       0.045        0.132    
d_clinics     8,415     0.035         0.029      0.0005      0.015       0.043        0.133    
d_grocery     8,415     0.020         0.012      0.002       0.011       0.026        0.056    
d_crime       8,415     0.002         0.001     0.00001      0.001       0.003        0.008    
Zone_Assessor 8,415     4.231         2.655        1           2           6            9      
AveSale2      8,415  311,958.600   271,086.300   10,000     157,000     374,000     4,706,372  
AveSale5      8,415  313,645.100   251,958.400   22,700     162,500    384,307.9    4,534,583  
AveSale10     8,415  316,288.100   243,272.500 37,900.000 164,540.000 388,990.000 3,597,141.000
-----------------------------------------------------------------------------------------------

Census Tract
================================================================
Statistic    N     Mean    St. Dev.  Min Pctl(25) Pctl(75)  Max 
----------------------------------------------------------------
WhiteAlone 8,415 2,843.831 1,293.914 89   2,119    3,731   6,815
Poverty    8,415  756.843   497.803  26    345     1,024   2,623
Unemploy   8,415  145.165   116.541   0     64      183     643 
BachDegree 8,415  792.530   437.939  45    438     1,154   2,050
----------------------------------------------------------------
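The columns of the summary tables above (N, mean, standard deviation, minimum, 25th and 75th percentiles, maximum) can be reproduced for any numeric variable with a few lines of code. The sketch below uses Python's standard library rather than the tooling behind the report; the helper name and the sample values are hypothetical, and the "inclusive" quantile method is one common interpolation convention that may differ slightly from the one used for the tables.

```python
import statistics

def summarize(values):
    """Summary statistics in the same column order as the tables above."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "N": len(values),
        "Mean": statistics.fmean(values),
        "St. Dev.": statistics.stdev(values),  # sample standard deviation
        "Min": min(values),
        "Pctl(25)": q1,
        "Pctl(75)": q3,
        "Max": max(values),
    }

# Hypothetical fixture counts for five homes; these are not the report's data.
fixtures = [3, 7, 8, 12, 15]
row = summarize(fixtures)
for name, value in row.items():
    print(f"{name:>9}: {value}")
```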

Correlation Matrix of Variables


Dependent Variable

Home Price Map


Independent Variables

A cluster of crime can be observed in the map below. In the central district, the distance to aggravated crime is smaller than in the surrounding regions. The map implies that the southeastern region is relatively safer than other parts of the study area, either because reported crimes are farther from each house or because fewer aggravated crimes are reported there.

Besides the clustering within each census tract, the map below indicates that neighborhoods in the southwestern part of the County are wealthier, with fewer residents living below the poverty level. Neighborhoods in the northwest of the County have a higher number of individuals living in poverty. This pattern partly mirrors the Distance to Nearest 5 Reported Aggravated Crime Map: neighborhoods with less exposure to aggravated crime also have fewer residents in poverty.

The average sale price of nearest neighbors is one of the most powerful predictors for accounting for spatial autocorrelation. This variable explains the spatial pattern of sale prices particularly well. Comparing the map with the Davidson County Home Prices Map above shows that it reflects the clustering of high and low sale prices.
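A neighbor-average feature such as AveSale2, AveSale5, or AveSale10 can be sketched as the mean sale price of the k closest homes, excluding the home itself. The function name, coordinates, and prices below are hypothetical, and the brute-force search is chosen for clarity rather than speed.

```python
import math

def neighbor_average_price(index, coords, prices, k):
    """Mean sale price of the k homes closest to home `index`, excluding itself."""
    x0, y0 = coords[index]
    others = [
        (math.hypot(x - x0, y - y0), prices[i])
        for i, (x, y) in enumerate(coords)
        if i != index
    ]
    others.sort(key=lambda pair: pair[0])  # closest homes first
    nearest = others[:k]
    return sum(price for _, price in nearest) / len(nearest)

# Hypothetical homes along a street; these are not the report's data.
coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
prices = [100_000, 200_000, 300_000, 900_000]

ave_sale2 = [neighbor_average_price(i, coords, prices, k=2) for i in range(len(coords))]
print(ave_sale2)
```

For homes whose sale price is unknown, the neighbors would presumably have to be drawn only from homes with observed prices; that detail is an assumption here, not something the report states.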


Other Interesting Variables


Methods

After collecting and cleaning the data, we fit an ordinary least squares (OLS) regression to explore the relationship between observed sale prices and the explanatory variables (i.e., predictors), and to estimate the unknown sale prices. We used the observed sale prices and the selected predictors to train the regression model on the training dataset; as the regression table shows, the dependent variable is the log of sale price. OLS chooses the coefficients that minimize the sum of squared errors between predicted and observed values, and we then use the fitted model to estimate the missing sale prices in the test dataset. We evaluate the model's performance with R-squared, mean absolute percentage error (MAPE), and mean absolute error (MAE).
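The workflow described above (fit on training data, predict held-out prices, then compare predictions with observations) can be illustrated with a deliberately small sketch. It uses a single predictor and made-up numbers, whereas the report's model is a multiple regression with many predictors; the function names and data are hypothetical. Because the dependent variable is on a log scale, predictions are exponentiated before computing dollar errors.

```python
import math

def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing squared error for y = a + b * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mae(observed, predicted):
    """Mean absolute error, in the units of the observed values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean absolute percentage error, as a fraction of the observed value."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Share of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical training and test homes (finished square feet, sale price).
train_sqft = [1000, 1500, 2000, 2500]
train_price = [150_000, 210_000, 300_000, 420_000]
test_sqft = [1200, 2200]
test_price = [170_000, 350_000]

# Fit on the log scale, then back-transform predictions to dollars.
log_price = [math.log(p) for p in train_price]
intercept, slope = fit_simple_ols(train_sqft, log_price)
fitted_log = [intercept + slope * s for s in train_sqft]
predicted = [math.exp(intercept + slope * s) for s in test_sqft]

print("R-squared (training, log scale):", round(r_squared(log_price, fitted_log), 3))
print("MAE (test, dollars):", round(mae(test_price, predicted)))
print("MAPE (test):", round(mape(test_price, predicted), 3))
```

MAE is reported in dollars and MAPE as a share of the observed price, so the two together show both the absolute and the relative size of the errors.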


Results

This section presents visualizations of the prediction results and the results of the model fit tests.
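Because the dependent variable in the regression table is log(SalePrice), a coefficient is not a dollar amount. A coefficient b corresponds to a change of roughly 100 * (exp(b) - 1) percent in sale price, holding the other predictors fixed; for a dummy variable, the change is relative to the omitted baseline category. The short sketch below applies that conversion to three coefficients copied from the table; the helper name is hypothetical.

```python
import math

def percent_effect(coefficient):
    """Percent change in sale price implied by a log-price coefficient."""
    return 100 * (math.exp(coefficient) - 1)

# Coefficients copied from the regression table in this section.
examples = [
    ("neighjud1", 0.048),
    ("LocationCityBRENTWOOD", 0.240),
    ("Parks", -0.053),
]

for name, beta in examples:
    print(f"{name}: {percent_effect(beta):+.1f}%")
```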

Regression Result Table


=========================================================================
                                                  Dependent variable:    
                                              ---------------------------
                                                    log(SalePrice)       
-------------------------------------------------------------------------
CensusBlock                                            0.00001**         
                                                       t = 2.292         
neighjud1                                              0.048***          
                                                       t = 4.032         
housing_units                                          0.00005*          
                                                       t = 1.766         
num_vacant_unit                                         0.00005          
                                                       t = 0.711         
WhiteAlone                                             0.00003**         
                                                       t = 2.308         
Poverty                                               -0.0001***         
                                                      t = -5.338         
Unemploy                                               0.0004***         
                                                       t = 4.877         
BachDegree                                            -0.0001***         
                                                      t = -2.744         
Parks                                                  -0.053***         
                                                      t = -3.545         
d_Retail                                               -1.744**          
                                                      t = -2.376         
d_clinics                                               -0.885           
                                                      t = -1.426         
d_grocery                                               2.294**          
                                                       t = 2.112         
d_crime                                                  8.324           
                                                       t = 1.564         
LocationCityBRENTWOOD                                  0.240***          
                                                       t = 4.858         
LocationCityMADISON                                      0.095           
                                                       t = 1.012         
LocationCityNASHVILLE                                   0.134*           
                                                       t = 1.876         
LocationCityWHITES CREEK                                -0.105           
                                                      t = -0.374         
LocationZip37027                                                         
                                                                         
LocationZip37115                                                         
                                                                         
LocationZip37189                                                         
                                                                         
LocationZip37201                                       0.403***          
                                                       t = 3.867         
LocationZip37203                                       0.412***          
                                                       t = 6.109         
LocationZip37204                                       0.323***          
                                                       t = 4.595         
LocationZip37205                                       0.352***          
                                                       t = 5.174         
LocationZip37206                                       0.411***          
                                                       t = 4.939         
LocationZip37207                                        -0.075           
                                                      t = -0.905         
LocationZip37208                                       0.225***          
                                                       t = 3.118         
LocationZip37209                                       0.369***          
                                                       t = 5.487         
LocationZip37210                                        -0.123           
                                                      t = -1.612         
LocationZip37211                                        -0.089           
                                                      t = -1.320         
LocationZip37212                                       0.305***          
                                                       t = 4.095         
LocationZip37214                                        -0.067           
                                                      t = -0.811         
LocationZip37215                                       0.311***          
                                                       t = 4.480         
LocationZip37216                                       0.394***          
                                                       t = 4.776         
LocationZip37217                                       -0.237***         
                                                      t = -3.125         
LocationZip37218                                         0.094           
                                                       t = 0.885         
LocationZip37219                                        0.392**          
                                                       t = 2.167         
LocationZip37220                                       0.230***          
                                                       t = 2.739         
LocationZip37221                                                         
                                                                         
LandUseFullDescriptionRESIDENTIAL COMBO/MISC            -0.174           
                                                      t = -0.342         
LandUseFullDescriptionRESIDENTIAL CONDO                  0.128           
                                                       t = 0.539         
LandUseFullDescriptionSINGLE FAMILY                      0.130           
                                                       t = 0.548         
LandUseFullDescriptionVACANT RESIDENTIAL LAND           0.551*           
                                                       t = 1.884         
LandUseFullDescriptionZERO LOT LINE                     -0.137           
                                                      t = -0.568         
nasimp2                                                -0.487***         
                                                      t = -6.204         
nasimp3                                                -0.825***         
                                                      t = -5.526         
nasimp7                                                -1.153**          
                                                      t = -2.542         
nasimp10                                                -0.089           
                                                      t = -1.086         
nasimp11                                               -0.436***         
                                                      t = -5.557         
nasimp12                                               -0.202***         
                                                      t = -2.814         
nasimp13                                                0.278*           
                                                       t = 1.666         
nasimp14                                                0.193*           
                                                       t = 1.755         
nasimp16                                                -0.061           
                                                      t = -0.827         
nasimp17                                                -0.440           
                                                      t = -1.038         
nasimp19                                               -0.269***         
                                                      t = -2.593         
nasimp20                                               0.249***          
                                                       t = 2.917         
nasimp21                                                -0.146*          
                                                      t = -1.742         
nasimp22                                                -0.025           
                                                      t = -0.304         
nasimp23                                                -0.118           
                                                      t = -1.455         
nasimp24                                                -0.132           
                                                      t = -1.503         
nasimp25                                                -0.094           
                                                      t = -1.172         
nasimp26                                               -0.188**          
                                                      t = -2.304         
nasimp27                                               -0.185**          
                                                      t = -2.132         
nasimp30                                               -0.216***         
                                                      t = -2.640         
nasimp31                                                -0.112           
                                                      t = -1.615         
nasimp32                                               -0.163**          
                                                      t = -2.315         
nasimp33                                               -0.375***         
                                                      t = -5.664         
nasimp34                                                 0.036           
                                                       t = 0.423         
nasimp35                                               -0.461***         
                                                      t = -5.223         
nasimp36                                               -0.263***         
                                                      t = -3.413         
nasimp37                                                -0.088           
                                                      t = -1.254         
nasimp38                                                -0.035           
                                                      t = -0.482         
nasimp39                                               -0.226***         
                                                      t = -3.258         
nasimp40                                                -0.047           
                                                      t = -0.710         
nasimp41                                               -0.147**          
                                                      t = -2.155         
nasimp42                                                -0.064           
                                                      t = -1.032         
nasimp43                                               -0.180***         
                                                      t = -2.720         
nasimp44                                               -0.295***         
                                                      t = -3.939         
nasimp48                                               -0.329***         
                                                      t = -4.284         
nasimp49                                                -0.112           
                                                      t = -0.692         
nasimp60                                                 0.012           
                                                       t = 0.174         
nasimp62                                                0.148*           
                                                       t = 1.922         
nasimp63                                               -0.129**          
                                                      t = -2.121         
nasimp64                                               -0.242***         
                                                      t = -3.417         
nasimp67                                               -0.173**          
                                                      t = -2.381         
nasimp69                                               -0.314***         
                                                      t = -3.019         
nasimp72                                                -0.346           
                                                      t = -0.857         
nasimp73                                               -0.335***         
                                                      t = -4.097         
nasimp92                                                -0.154           
                                                      t = -0.765         
nasimp93                                                -0.127           
                                                      t = -0.427         
Acrage                                                 0.124***          
                                                       t = 5.586         
Story_Height1.25 STORY                                   0.069           
                                                       t = 1.143         
Story_Height1.5 STORY                                    0.013           
                                                       t = 0.540         
Story_Height1.75 STORY                                   0.023           
                                                       t = 0.897         
Story_Height2 STORY                                     -0.018           
                                                      t = -1.138         
Story_Height2.25 STORY                                  -0.134           
                                                      t = -1.573         
Story_Height2.5 STORY                                   -0.004           
                                                      t = -0.044         
Story_Height2.75 STORY                                 -0.230**          
                                                      t = -2.208         
Story_Height3 STORY                                      0.008           
                                                       t = 0.242         
Story_Height4 STORY                                     -0.294           
                                                      t = -1.240         
Story_HeightBI-LEVEL                                     0.062           
                                                       t = 1.107         
Story_HeightCOM 3 STY                                    0.026           
                                                       t = 0.064         
Story_HeightCOM 4 STY                                    0.555           
                                                       t = 1.360         
Story_HeightSPLIT LEVEL                                  0.073           
                                                       t = 1.602         
Exterior_WallBRICK/FRAME                               -0.059***         
                                                      t = -3.747         
Exterior_WallCONC BLK                                    0.009           
                                                       t = 0.100         
Exterior_WallFRAME                                     -0.091***         
                                                      t = -6.466         
Exterior_WallFRAME/STONE                               -0.177**          
                                                      t = -2.091         
Exterior_WallLOG                                        -0.183           
                                                      t = -0.432         
Exterior_WallMETAL                                       0.159           
                                                       t = 1.025         
Exterior_WallSTONE                                       0.011           
                                                       t = 0.227         
Exterior_WallSTUCCO                                    -0.125***         
                                                      t = -3.260         
FrameRESD FRAME                                        0.232***          
                                                       t = 6.203         
FrameRESD FRAME                                        0.172***          
                                                       t = 7.140         
FrameTYPICAL                                           0.188***          
                                                       t = 8.998         
age1830                                                1.634***          
                                                       t = 2.828         
age1860                                                1.994***          
                                                       t = 3.450         
age1890                                                1.327***          
                                                       t = 3.155         
age1900                                                1.364***          
                                                       t = 3.267         
age1910                                                1.294***          
                                                       t = 3.147         
age1920                                                1.291***          
                                                       t = 3.170         
age1930                                                1.324***          
                                                       t = 3.252         
age1940                                                1.295***          
                                                       t = 3.184         
age1950                                                1.267***          
                                                       t = 3.118         
age1960                                                1.214***          
                                                       t = 2.987         
age1970                                                1.170***          
                                                       t = 2.877         
age1980                                                1.271***          
                                                       t = 3.126         
age1990                                                1.331***          
                                                       t = 3.272         
age2000                                                1.398***          
                                                       t = 3.438         
age2010                                                1.483***          
                                                       t = 3.645         
units_building1                                         -0.003           
                                                      t = -0.030         
units_building2                                          0.012           
                                                       t = 0.045         
units_building4                                         -0.460           
                                                      t = -1.085         
units_building11                                         0.379           
                                                       t = 0.899         
sf_finished                                            0.0001***         
                                                      t = 11.280         
sf_bsmt                                                -0.00000          
                                                      t = -0.165         
ac_sfyi1                                                -0.054           
                                                      t = -1.492         
Phys_DepreciationDilapidated                           -0.433**          
                                                      t = -2.129         
Phys_DepreciationExcellent                               0.195           
                                                       t = 0.675         
Phys_DepreciationFair                                  -0.144***         
                                                      t = -3.640         
Phys_DepreciationGood                                   0.134**          
                                                       t = 2.178         
Phys_DepreciationPoor                                  -0.293***         
                                                      t = -3.618         
Phys_DepreciationVery Good                              0.345**          
                                                       t = 2.026         
Phys_DepreciationVery Poor                             -0.337**          
                                                      t = -2.474         
NumofUnits_land                                         0.00000          
                                                       t = 1.505         
Zone_Assessor                                           -0.007           
                                                      t = -1.313         
Land_Unit_TypeN NASHVILLE RPDLND                        0.939*           
                                                       t = 1.917         
Land_Unit_TypeOH MAD RG RPDLND                           0.410           
                                                       t = 0.820         
Land_Unit_TypePRIME SF                                   0.288           
                                                       t = 0.609         
Land_Unit_TypeR PRIME AC                                 0.470           
                                                       t = 0.995         
Land_Unit_TypeR PRIME SF                                 0.540           
                                                       t = 1.281         
Land_Unit_TypeR RESID`L SF                               0.564           
                                                       t = 1.235         
Land_Unit_TypeR SITE VAL                                 0.476           
                                                       t = 1.169         
Land_Unit_TypeR SITE VAL RESD SITE VALUE                 0.442           
                                                       t = 1.083         
Land_Unit_TypeR UNDVL SF                                0.932**          
                                                       t = 1.964         
Land_Unit_TypeRPDLND                                     0.265           
                                                       t = 0.459         
Land_Unit_Types                                          0.248           
                                                       t = 0.430         
Land_Unit_TypeVNDY HBVLG RPDLND                          0.538           
                                                       t = 0.926         
baths1                                                  -0.109           
                                                      t = -0.379         
baths2                                                   0.027           
                                                       t = 0.092         
baths3                                                   0.008           
                                                       t = 0.027         
baths4                                                  -0.133           
                                                      t = -0.456         
baths5                                                  -0.315           
                                                      t = -1.062         
baths6                                                  -0.308           
                                                      t = -0.969         
baths7                                                  -0.606*          
                                                      t = -1.729         
baths8                                                 -3.045***         
                                                      t = -5.947         
Fixtures                                                -0.001           
                                                      t = -0.274         
FoundationCRAWL                                         -0.046           
                                                      t = -0.112         
FoundationFULL BASEMENT                                 -0.015           
                                                      t = -0.035         
FoundationPART BASEMENT                                 -0.025           
                                                      t = -0.060         
FoundationPIERS                                          0.210           
                                                       t = 0.482         
FoundationSLAB                                          -0.088           
                                                      t = -0.217         
FoundationTYPICAL                                       -0.041           
                                                      t = -0.071         
AveSale2                                              0.00000***         
                                                      t = 35.396         
AveSale5                                              0.00000***         
                                                       t = 8.099         
AveSale10                                             -0.00000***        
                                                      t = -11.936        
Constant                                              -428.107**         
                                                      t = -2.241         
-------------------------------------------------------------------------
Observations                                             8,415           
R2                                                       0.710           
Adjusted R2                                              0.704           
Residual Std. Error                                0.404 (df = 8242)     
F Statistic                                   117.351*** (df = 172; 8242)
=========================================================================
Note:                                         *p<0.1; **p<0.05; ***p<0.01

Table of R squared, MAE and MAPE for the Test Set

Index    R Squared    MAE           MAPE
Value    0.723        106279.974    0.351
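
For reference, the three metrics in this table can be computed along the lines of the following minimal sketch. The vectors `observed` and `predicted` hold made-up illustrative values, not project data, and the helper name `regression_metrics` is hypothetical. R squared is taken here as the squared correlation between observed and predicted prices, which is an assumption about how the table was produced.

```r
# Hypothetical helper: accuracy metrics for predicted vs. observed sale prices.
regression_metrics <- function(observed, predicted) {
  abs_err <- abs(observed - predicted)
  c(r_squared = cor(observed, predicted)^2,  # assumed definition: squared correlation
    mae       = mean(abs_err),               # mean absolute error, in dollars
    mape      = mean(abs_err / observed))    # mean absolute percent error, as a fraction
}

# Illustrative values only (not from the project data).
observed  <- c(200000, 350000, 125000, 480000)
predicted <- c(220000, 300000, 150000, 500000)

metrics <- regression_metrics(observed, predicted)
print(round(metrics, 3))
```

Because the model is fitted on log(SalePrice), predictions would need to be back-transformed with exp() before MAE and MAPE are computed in dollar terms.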

Cross Validation

The mean R squared of the cross validation result is: 0.788 
The standard deviation of R squared of the cross validation result is: 0.129 

According to the histogram of R Squared, the model is not overfitting.
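
The cross-validation summary above can be reproduced in outline with base R. This is a sketch on simulated data, not the project code: the number of folds (k = 10), the toy predictors `x1` and `x2`, and the fold-level R squared definition (squared correlation on the held-out fold) are all assumptions, since the report does not state them.

```r
set.seed(1)

# Simulated stand-in for the modeling dataset (not project data).
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$log_price <- 12 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(n, sd = 0.4)

k <- 10                                             # assumed number of folds
fold_id <- sample(rep(seq_len(k), length.out = n))  # random fold assignment

# Fit on k - 1 folds, score on the held-out fold, repeat for every fold.
fold_r2 <- sapply(seq_len(k), function(i) {
  train <- toy[fold_id != i, ]
  held  <- toy[fold_id == i, ]
  fit   <- lm(log_price ~ x1 + x2, data = train)
  cor(held$log_price, predict(fit, newdata = held))^2
})

cat("mean R squared:", round(mean(fold_r2), 3), "\n")
cat("sd of R squared:", round(sd(fold_r2), 3), "\n")
hist(fold_r2, main = "R squared across folds", xlab = "R squared")
```

A tight histogram centered near the training R squared suggests that performance is stable on unseen data; a wide spread or a long left tail would point toward overfitting.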

Predicted Prices vs. Observed Prices

Residual Map and Moran’s I of the Randomly Selected Test Set

From a visual assessment of the Regression Residuals Map, the residuals appear to be randomly distributed throughout the study area. The p-value of the Moran’s I test supports this: the null hypothesis of no spatial autocorrelation in the test-set residuals cannot be rejected, which suggests that the model accounts for most of the spatial autocorrelation in sale prices.
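
To make the test concrete, the following base-R sketch computes Moran’s I on a set of residuals and derives a permutation p-value. It is an illustration under stated assumptions, not the analysis behind the map: the coordinates and residuals are simulated, the helper names `morans_i` and `knn_weights` are hypothetical, and the choice of 5 nearest neighbors with row-standardized weights is assumed because the report does not specify the spatial weights. In practice a package such as spdep would normally be used.

```r
# Moran's I for values x given a spatial weights matrix w.
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Row-standardized k-nearest-neighbor weights from point coordinates.
knn_weights <- function(coords, k = 5) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                          # a point is not its own neighbor
  w <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    w[i, order(d[i, ])[seq_len(k)]] <- 1 / k
  }
  w
}

set.seed(42)
coords <- cbind(runif(150), runif(150))   # made-up locations
resid  <- rnorm(150)                      # made-up residuals with no spatial pattern
w      <- knn_weights(coords, k = 5)

observed_i <- morans_i(resid, w)

# Permutation test: shuffle residuals across locations to build the null distribution.
permuted_i <- replicate(999, morans_i(sample(resid), w))
p_value    <- (sum(permuted_i >= observed_i) + 1) / (999 + 1)

cat("Moran's I:", round(observed_i, 3), " p-value:", round(p_value, 3), "\n")
```

A Moran’s I close to zero with a large p-value is what randomly distributed residuals would produce; a clearly positive value with a small p-value would indicate that nearby homes share similar errors.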


Predicted Values for The Entire Dataset

According to the prediction result map above, higher home prices are expected in the southwest of the study area; the northern and southeastern parts of the study region have lower predicted home prices.

MAPE by Zip for The Test Set

The MAPE by Zip for the Test Set result shows that the model performs better in the southern part of the study region (yellow color) and has larger percentage errors in the northern and southeastern regions (dark blue), which have relatively lower predicted home prices.
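
The zip-level MAPE behind such a map is a grouped average of absolute percent errors. The sketch below shows the calculation in base R; the data frame `test_pred`, its column names, and the zip labels are hypothetical placeholders rather than project data.

```r
# Illustrative test-set predictions (made-up values and placeholder zip labels).
test_pred <- data.frame(
  zip       = c("zip_A", "zip_A", "zip_B", "zip_B", "zip_C"),
  observed  = c(400000, 500000, 150000, 120000, 250000),
  predicted = c(380000, 525000, 195000, 150000, 240000)
)

# Absolute percent error for each sale, then the mean within each zip code.
test_pred$ape <- abs(test_pred$observed - test_pred$predicted) / test_pred$observed
mape_by_zip <- aggregate(ape ~ zip, data = test_pred, FUN = mean)
names(mape_by_zip)[2] <- "mape"

print(mape_by_zip)
```

Because the error is divided by the observed price, the same dollar error weighs more heavily in zip codes with cheaper homes, which is one reason lower-priced areas can show higher MAPE.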

MAPE by Zip Code vs. Mean Price by Zip Code


Spatial Cross-validation

In this section, we attempt a spatial cross-validation to measure the generalizability of the model across neighborhoods with different levels of income.

Code for holding out a Wealthy, a Middle-Income and a Low-Income Neighborhood

library(dplyr)  # provides %>%, filter(), select() and mutate()

# Wealthy neighborhood: hold out neighborhood 5, fit on all others, predict the held-out sales
richnum <- 5
richset <- rall_1 %>% filter(neigh == richnum)
withoutrichset <- rall_1 %>% filter(neigh != richnum)
regwithoutrich <- lm(log(SalePrice) ~ .,
                     data = withoutrichset %>% select(-kenID, -test, -longitude, -latitude, -neigh))
richval <- predict(regwithoutrich, richset)  # predictions on the log scale
richbind <- cbind('obspr' = richset$SalePrice, 'logpr' = richval) %>% as.data.frame() %>%
  mutate(predpr = exp(logpr),  # back-transform to dollars
         richness = 'Rich')
mapeofrich <- mean(abs(richbind$obspr - richbind$predpr) / richbind$obspr)
maeofrich <- mean(abs(richbind$obspr - richbind$predpr))

# Middle-income neighborhood: same procedure with neighborhood 46 held out
mediannum <- 46
medset <- rall_1 %>% filter(neigh == mediannum)
withoutmedset <- rall_1 %>% filter(neigh != mediannum)
regwithoutmed <- lm(log(SalePrice) ~ .,
                    data = withoutmedset %>% select(-kenID, -test, -longitude, -latitude, -neigh))
medval <- predict(regwithoutmed, medset)
medbind <- cbind('obspr' = medset$SalePrice, 'logpr' = medval) %>% as.data.frame() %>%
  mutate(predpr = exp(logpr),
         richness = 'Middle-income')
mapeofmed <- mean(abs(medbind$obspr - medbind$predpr) / medbind$obspr)
maeofmed <- mean(abs(medbind$obspr - medbind$predpr))

# Low-income neighborhood: same procedure with neighborhood 266 held out
# (rank no. 161 out of 181)
poornum <- 266
poorset <- rall_1 %>% filter(neigh == poornum)
withoutpoorset <- rall_1 %>% filter(neigh != poornum)
regwithoutpoor <- lm(log(SalePrice) ~ .,
                     data = withoutpoorset %>% select(-kenID, -test, -longitude, -latitude, -neigh))
poorval <- predict(regwithoutpoor, poorset)
poorbind <- cbind('obspr' = poorset$SalePrice, 'logpr' = poorval) %>% as.data.frame() %>%
  mutate(predpr = exp(logpr),
         richness = 'Poor')
mapeofpoor <- mean(abs(poorbind$obspr - poorbind$predpr) / poorbind$obspr)
maeofpoor <- mean(abs(poorbind$obspr - poorbind$predpr))

MAPE and MAE Table

Selected rich, middle-income, and poor neighborhoods

Richness    MAPE     MAE
Rich        0.202    116114
Middle      0.438    74581
Poor        0.356    32493

Scatterplots of predicted vs. observed for the selected rich, middle-income, and poor neighborhoods

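Measured by MAPE, the model performs best in the wealthy neighborhood (0.202), moderately in the poor neighborhood (0.356), and worst in the middle-income neighborhood (0.438). MAE, by contrast, rises with neighborhood wealth, as would be expected when absolute errors scale with price levels. However, each result rests on a single held-out neighborhood and is influenced by the number of observations in the training set and test set. Overall, the model is less accurate in percentage terms in the lower-income neighborhoods than in the wealthy one.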
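
Since the three hold-out blocks repeat the same steps, the procedure can be wrapped in a function and applied to every neighborhood, which also makes the training and test counts visible next to each error. The sketch below uses simulated data; the helper name `holdout_neighborhood`, the toy predictor `sqft`, and the simulated prices are hypothetical, and only the column names `neigh` and `SalePrice` follow the code above.

```r
# Hypothetical helper: hold out one neighborhood, fit on the rest, score the held-out sales.
holdout_neighborhood <- function(data, id, formula) {
  held  <- data[data$neigh == id, ]
  train <- data[data$neigh != id, ]
  fit   <- lm(formula, data = train)
  pred  <- exp(predict(fit, newdata = held))   # back-transform from log price
  data.frame(neigh   = id,
             n_train = nrow(train),
             n_test  = nrow(held),
             mape    = mean(abs(held$SalePrice - pred) / held$SalePrice),
             mae     = mean(abs(held$SalePrice - pred)))
}

# Simulated stand-in data (not project data): 4 neighborhoods with 30 sales each.
set.seed(7)
toy <- data.frame(neigh = rep(1:4, each = 30), sqft = runif(120, 800, 3500))
toy$SalePrice <- exp(11 + 0.0004 * toy$sqft + 0.1 * toy$neigh + rnorm(120, sd = 0.3))

# Apply the hold-out procedure to every neighborhood and stack the results.
results <- do.call(rbind, lapply(unique(toy$neigh), holdout_neighborhood,
                                 data = toy, formula = log(SalePrice) ~ sqft))
print(results)
```

Running the hold-out over all neighborhoods, rather than three hand-picked ones, would show whether the pattern across income levels holds in general or reflects the particular neighborhoods chosen.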


Discussion

Our predictive model performs moderately well according to the results. The most interesting variables include the average distance to aggravated crimes, the number of individuals living in poverty by tract, and the average sale prices of the nearest 2, 5 and 10 neighbors. The first two variables capture variation in home prices that comes from external influences and socioeconomic characteristics. The latter three variables, derived directly from the dependent variable sale price, are highly significant and explain the spatial patterns of home price variation well. According to the Regression Result Table, the variables nasimp and age are also highly associated with the sale price.

The finalized model has a mean cross-validated R squared of 0.788, meaning that about 79% of the variation in sale prices is explained by the model across the cross-validation folds; on the test set, R squared is 0.723. The mean absolute percent error (MAPE) on the test set is 0.351, meaning that predictions are off by about 35% on average. From the regression residual map, the residuals are randomly distributed throughout the study area, and the result of the Moran’s I test supports the observation that the model has largely accounted for spatial autocorrelation.

The MAPE by zip code indicates that the model predicts particularly well in the southern part of the study area but performs poorly in the southeastern region. One reason might be that there are other contributing factors in the southeast region that we have not taken into consideration. Variables such as median household income may explain the variation in buying power among different neighborhoods.


Conclusion

We would not recommend our model to Zillow at this stage, because its performance is not strong enough: inaccurate predictions could lead to wrong expectations and frustration among clients. We could continue to improve its accuracy and generalizability by building up our domain knowledge and including more variables associated with home prices.